Temporal structure constrained transformation for speaker adaptation

Authors

  • Eric H. C. Choi
  • Trym Holter
  • Julien Epps
  • Arun Gopalakrishnan
Abstract

In this paper we suggest that, rather than modeling speaker mismatch as an affine transform of the entire feature vector, it can be modeled as an affine transform of the static coefficients, with additional constraints imposed by the temporal relationships among the streams of coefficients. As a result, the different streams share the same rotation matrix, which reduces the complexity and memory requirements of speaker adaptation and lowers the amount of adaptation data required. We present the solution for the case where temporal structure constrained transforms (TSCT) are optimized using the maximum likelihood criterion. The experiments presented in the paper show that, with the proposed approach, the same accuracy after adaptation on the Wall Street Journal (WSJ) task can be achieved using only 60% of the transformation parameters that a conventional block-diagonal transformation would require. In addition, TSCT provides better recognition accuracy when only a very limited amount of adaptation data is available.
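The claim that the streams can share one rotation matrix can be illustrated with a small numerical sketch. It assumes the dynamic (delta) stream is computed from the static stream with the common linear regression formula; the abstract does not state which formula the paper uses. Under that assumption, applying an affine transform `A x + b` to the static coefficients and then recomputing deltas gives the same result as applying `A` alone to the original deltas, because the bias cancels in the differences. The names `delta`, `affine`, the window size, and the toy dimensions are all hypothetical, and the sketch does not implement the paper's maximum likelihood estimation of TSCT.

```python
import random


def delta(frames, window=2):
    """Regression-style delta coefficients (assumed formula, not from the paper).

    Frames past either end of the sequence are replaced by the nearest edge frame.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(k * k for k in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = [0.0] * dim
        for k in range(1, window + 1):
            nxt = frames[min(t + k, n_frames - 1)]
            prv = frames[max(t - k, 0)]
            for d in range(dim):
                row[d] += k * (nxt[d] - prv[d]) / denom
        out.append(row)
    return out


def affine(A, b, x):
    """Return A @ x + b for a square matrix A given as a list of rows."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) + b[i]
            for i in range(len(A))]


def max_abs_diff(u, v):
    """Largest element-wise absolute difference between two frame sequences."""
    return max(abs(p - q) for ru, rv in zip(u, v) for p, q in zip(ru, rv))


random.seed(0)
dim, n_frames = 3, 10  # toy sizes chosen only for illustration
A = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
b = [random.uniform(-1, 1) for _ in range(dim)]
zero = [0.0] * dim
static = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_frames)]

# Route 1: adapt the static stream, then recompute its deltas.
adapted_static = [affine(A, b, x) for x in static]
delta_of_adapted = delta(adapted_static)

# Route 2: apply the same matrix A (with no bias) to the original deltas.
adapted_delta = [affine(A, zero, d) for d in delta(static)]

gap = max_abs_diff(delta_of_adapted, adapted_delta)
print("max difference between the two routes:", gap)
```

The two routes agree up to floating-point rounding. This linearity is what would allow a single matrix to be stored and estimated for all streams instead of one block per stream.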


Related resources

Constrained Structural Maximum A for Average-Voice-Based

This paper proposes a constrained structural maximum a posteriori linear regression (CSMAPLR) algorithm for further improvement of speaker adaptation performance in HMM-based speech synthesis. In the algorithm, the concept of structural maximum a posteriori (SMAP) adaptation is applied to estimation of transformation matrices of the constrained MLLR (CMLLR), where recursive MAP-based estimation...


Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems

Speaker variability is one of the major error sources for ASR systems. Speaker adaptation estimates speaker-specific models from the speaker-independent ones to minimize the mismatch between the training and testing conditions arising from speaker variability. One of the commonly adopted approaches is the transformation-based method. In this paper, the discriminative input and output transform...


Exploitation of Feature Vector Structure for Speaker Adaptation

In this paper we suggest that rather than modelling speaker mismatch as an affine transform of the entire feature vector, it can be modelled by an affine transform of the static coefficients with additional constraints imposed by the temporal relationships of the streams of coefficients. This results in the different streams sharing the same rotation matrix, and thus reduces the complexity and ...


Linear feature space projections for speaker adaptation

We extend the well-known technique of constrained Maximum Likelihood Linear Regression (MLLR) to compute a projection (instead of a full rank transformation) on the feature vectors of the adaptation data. We model the projected features with phone-dependent Gaussian distributions and also model the complement of the projected space with a single class-independent, speaker-specific Gaussian dist...


A basis representation of constrained MLLR transforms for robust adaptation

Constrained Maximum Likelihood Linear Regression (CMLLR) is a speaker adaptation method for speech recognition that can be realized as a feature-space transformation. In its original form it does not work well when the amount of speech available for adaptation is less than about five seconds, because of the difficulty of robustly estimating the parameters of the transformation matrix. In this pa...




Publication date: 2003